Modeling Local Interest Points for Semantic Detection and Video Search at TRECVID 2006

Authors

  • Yu-Gang Jiang
  • Xiaoyong Wei
  • Chong-Wah Ngo
  • Hung-Khoon Tan
  • Wanlei Zhao
  • Xiao Wu
Abstract

Local interest points (LIPs) and their features have been shown to obtain surprisingly good results in object detection and recognition. Their effectiveness and scalability, however, have not been seriously addressed on large-scale multimedia databases such as the TRECVID benchmark. The goal of our work is to investigate the role and performance of LIPs, when coupled with multi-modality features, for high-level feature extraction and automatic video search.

In high-level feature extraction, we explore both the local description and the spatial distribution of LIPs, for characterizing and sketching semantic concepts respectively. Two visual dictionaries, based on universal visual keywords and concept-based visual keywords, are generated for the experiments. The 39 concepts are learned by SVMs in a vector-space model with the support of both dictionaries. In addition, the spatial distribution of LIPs is exploited for detection under multi-resolution and embedded Earth Mover's Distance (EMD) settings (both representations are sketched after this abstract). We submitted six runs that combine these two properties of LIPs with other modalities, including grid-based color moment and wavelet texture:

  • CityU-HK1: average fusion of 4 SVM classifiers using universal visual keywords, distribution of LIPs, grid-based color moment, and wavelet texture.
  • CityU-HK2: average fusion of 3 SVM classifiers using universal visual keywords, grid-based color moment, and wavelet texture.
  • CityU-HK3: average fusion of 3 SVM classifiers using distribution of LIPs, grid-based color moment, and wavelet texture.
  • CityU-HK4: grid-based Apriori mining method.
  • CityU-HK5: average fusion of 3 SVM classifiers using concept-based visual keywords, grid-based color moment, and wavelet texture.
  • CityU-HK6: baseline method, average fusion of 2 SVM classifiers using grid-based color moment and wavelet texture.

Results show that the LIP-based features generate results comparable to those of traditional color/texture features. By incorporating the LIP-based features on top of color moment and wavelet texture, an improvement of 51.4% is reported.

In automatic search, we study the performance of query-by-example (QBE) and a mini-ontology (the 39 concepts) on top of a baseline text search. In QBE, the properties of LIPs serve as one of the retrieval features. With the mini-ontology, we measure the similarity of query terms to the 39 concepts and adopt various heuristic settings (detailed in Section 3.4) to test their significance for search (a sketch of this filtering idea is also given below). We submitted six runs, covering all queries, to show the advantage of searching with the mini-ontology as a semantic filter and to compare its performance against classical text search:

  • CityU-HK1: multimodal automatic run using text search and the mini-ontology with setting 1.
  • CityU-HK2: multimodal automatic run using text search and the mini-ontology with setting 3.
  • CityU-HK3: multimodal automatic run using text search, QBE, and the mini-ontology with setting 1.
  • CityU-HK4: multimodal automatic run using text search and the mini-ontology with setting 4.
  • CityU-HK5: multimodal automatic run using text search and the mini-ontology with setting 3.
  • CityU-HK6: required baseline run using ASR/MT transcripts only.
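
To make the visual-keyword representation concrete, the following is a minimal sketch of a bag-of-visual-keywords pipeline: pooled local descriptors are quantized into a dictionary by k-means, each keyframe becomes a normalized keyword histogram, and one SVM is trained per concept. This is an illustration under generic assumptions (scikit-learn, 500 keywords, pre-extracted descriptors), not the authors' implementation; all function names are hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def build_vocabulary(all_descriptors, k=500):
        # Quantize the pooled local descriptors into k visual keywords.
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

    def keyframe_histogram(descriptors, vocab):
        # Assign each descriptor of one keyframe to its nearest keyword
        # and return an L1-normalized keyword histogram.
        words = vocab.predict(descriptors)
        hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

    def train_concept_detector(histograms, labels):
        # One binary SVM per semantic concept, trained on keyword histograms.
        return SVC(kernel="rbf", probability=True).fit(histograms, labels)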
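
The distribution-based matching can be sketched in the same spirit. For two uniformly weighted LIP point sets of equal size, the Earth Mover's Distance reduces to an optimal one-to-one assignment, which is solvable exactly; the paper's multi-resolution and embedded EMD settings are more elaborate, so treat this only as the core idea.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    def lip_emd(points_a, points_b):
        # points_a, points_b: (n, 2) arrays of LIP image coordinates (equal n).
        # With uniform weights, EMD equals the mean cost of the optimal matching.
        cost = cdist(points_a, points_b)
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].mean()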
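
The average fusion used across the runs is simply the per-shot mean of the classifier probabilities; the one-liner below is included only to pin down what "average fusion of N SVM classifiers" means.

    import numpy as np

    def average_fusion(modality_scores):
        # modality_scores: list of (n_shots,) probability arrays, one per
        # classifier (e.g., visual keywords, color moment, wavelet texture).
        return np.mean(np.stack(modality_scores, axis=0), axis=0)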
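
For the search task, one plausible reading of the mini-ontology as a semantic filter is to re-weight text-search scores by the detector outputs of query-relevant concepts. The similarity measure and combination rule below are assumptions for illustration only; the actual heuristic settings are those described in the paper's Section 3.4.

    import numpy as np

    def rerank_with_ontology(text_scores, concept_scores, query_concept_sim):
        # text_scores: (n_shots,) baseline text-search scores.
        # concept_scores: (39, n_shots) detector outputs for the 39 concepts.
        # query_concept_sim: (39,) similarity of the query terms to each concept.
        w = query_concept_sim / max(query_concept_sim.sum(), 1e-9)
        semantic = w @ concept_scores          # weighted concept evidence per shot
        return text_scores * (1.0 + semantic)  # boost shots with concept support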


Similar resources

IBM Research TRECVID-2010 Video Copy Detection and Multimedia Event Detection System

In this paper, we describe the system jointly developed by IBM Research and Columbia University for video copy detection and multimedia event detection applied to the TRECVID-2010 video retrieval benchmark. A. Content-Based Copy Detection: The focus of our copy detection system this year was fusing three types of complementary fingerprints: a keyframe-based color correlogram, SIFTogram (bag of ...

KB Video Retrieval at TRECVID 2010

This paper describes KB Video Retrieval's participation in the TREC Video Retrieval Evaluation for 2010. This year we submitted results for the Semantic Indexing, Known-item Search, Instance Search, and Event Detection in Internet Multimedia tasks. Our goal was to evaluate ranking strategies and expand our knowledge-based approach to a variety of data sets and tasks.

Exploring Semantic Concept Using Local Invariant Features

This paper studies the role and performance of local invariant features arising from interest points in describing and sketching semantic concepts. Both the local description and spatial location of interest points are exploited, separately and jointly, for concept-based retrieval. In concept description, a visual dictionary is generated with each keyframe being depicted as a vector of keywords...

Learned Lexicon-Driven Interactive Video Retrieval

In this paper, we combine automatic learning of a large lexicon of semantic concepts with traditional video retrieval methods into a novel approach to narrow the semantic gap. The core of the proposed solution is formed by the automatic detection of an unprecedented lexicon of 101 concepts. From there, we explore the combination of query-by-concept, query-by-example, query-by-keyword, and user in...

VIREO at TRECVID 2010: Semantic Indexing, Known-Item Search, and Content-Based Copy Detection

This paper presents our approaches and the comparative analysis of our results for the three TRECVID 2010 tasks that we participated in: semantic indexing, known-item search and content-based copy detection. Semantic Indexing (SIN): Our main focus for the SIN task is on the study of the following two issues: 1) the effectiveness of concept detectors for indexing web video dataset, and 2) how to...


Publication date: 2006